PRE-APPEAL BRIEF REQUEST FOR REVIEW

SIR: 

Claims 43-46 are pending. Claims 43-46 are rejected under 35
U.S.C. § 102(e) over U.S. patent no. 6,424,980 (Iizuka et al.). This
rejection is respectfully traversed.

This application is directed to extracting data records from
structured text, such as a web page or any text-containing file, without prior 
knowledge of the structure of the text. The invention deduces the structure 
of the text by using information about the attributes and knowledge of 
candidate structures. (Specification at page 3, lines 9-15.) The claims 
recite how this is accomplished; in other words, what is claimed is not that 
data records are extracted without prior knowledge of the structure, but 
how that is effected. 

Iizuka et al. represent the prior art referenced by applicants in their
"Description of Related Art". On page 1, line 20, to page 2, line 7,
applicants state: 
Traditional techniques for solving this information
gathering problem are typically based on knowledge of the 
structure used to arrange data within each specific website. 
(The structure used to arrange the data within a page is 
commonly referred to as the syntax of the page.) These 
techniques require prior determination of the syntax of each 
page and storage of syntax information about each page in a 
data storage device, such as a database. 

When gathering information about a subject from a 
particular page, the traditional techniques identify the 
attributes of the subject by comparing the structure of the 
page with the stored structure information. When there is a 
match, the traditional technique returns the attribute value to 
the user. 

These traditional techniques are limited because they 
can only gather attribute values from a page when they know 
the syntax of a page. To put it differently, the traditional 
techniques can only gather attribute values when the syntax 
of a page has been previously determined and stored. 

Correspondingly, Iizuka et al. state:

The apparatus has a HTML document storing unit for 
storing meta data about HTML documents. That meta data 
includes the locations, document structures, presentation 
locations, presentation styles, etc., of the HTML documents 
for each HTML document.... The document structure data of 
the HTML documents specifies the structures of partial 
structure such as tables, lists and clauses contained in the 
HTML documents and is used to map element data in the 
table and lists to items to be extracted. 

(Col. 11, line 63, to col. 12, line 5)

In the preparatory phase, a managing person prepares
meta data about HTML documents through the HTML 
document meta data manager before starting the execution 
phase. 
(Col. 14, lines 30-32). 

In other words, Iizuka et al. require the syntax of documents that are
to be searched to be known and stored before a search can be conducted. 

The Examiner asserted that this argument is not persuasive 
because applicants' claims do not recite that data records within a file are 
identified without using prior knowledge of the structure (e.g., syntax) of the 
file. This assertion misses the point of applicants' argument. Applicants' 
argument explains why a teaching of how data records within a document 
may be identified without knowledge of the structure of a file is not found in 
Iizuka et al.: since Iizuka et al. know the structure a priori from pre-stored
meta data (see Iizuka et al., Figs. 12 and 13, and col. 14, lines 17-21), they
do not need to identify it, and consequently they do not teach how it may 
be identified. In contrast, and as was pointed out above, applicants' claims 
recite how this identification (and consequent record extraction) is done. 
Applicants are relying on the functionality - the particular steps that are 
recited in the claims - to distinguish their invention from Iizuka et al. Iizuka
et al. do not disclose, teach, or suggest that functionality.

Inter alia, applicants' claims recite "identifying potential locations of 
values of record fields in [a structured] text by identifying locations in the 
text of items in lists of known potential values for record fields." The 
Examiner asserted that "Iizuka discloses identifying potential locations of
values of record fields in the text in Figure 8 at reference signs S200." The
Examiner is mistaken. This step of Fig. 8 refers to determining the 
addresses of documents that are to be searched, in an HTML document 
table that stores the locations of HTML documents -- see col. 14, lines
15-17 and 47-51. In contrast, the claim language refers to identifying locations
of known potential values for record fields, within a document (text) that is
being searched. 

The Examiner dismissed this argument by stating that "the examiner
equates 'list of known potential values for record fields' with 'HTML
document table.'" But applicants' argument cannot be dismissed so easily.
The issue is what is being identified. Iizuka et al. identify addresses of
documents that are to be searched, whereas the claims search a text to
identify therein potential values (i.e., "items in lists of known potential
values") for record fields. The things that are being identified in Iizuka et
al. and in applicants' claims are unmistakably different.

Applicants' claims further recite "identifying a region of interest in the 
text by applying multiple candidate region partitioners, evaluating each to
measure how well it isolates a region with a high density and a high 
amount of potential locations of values of record fields, selecting one that 
measures best, and applying it to produce a region of interest." The 
Examiner asserted that "Iizuka discloses identifying a region of interest in
the text by applying candidate region partitions and segmenting the region 
of interest into record regions that contain data for a single record," and 
pointed to Iizuka et al.'s description of Ashish and Knoblock's technique at
col. 2, lines 45-65, as supporting this assertion. The Examiner is again 
mistaken. This technique identifies the regions (the internal structure) of a 
text (document). But it does not identify a region of interest among the 
regions of the text, as required by applicants' claims. 

Undaunted, the Examiner dismissed this argument by asserting that
"Iizuka discloses 'This technique considers a portion in HTML document as
meaningful information' (column 2, line 50, emphasis added). The
examiner equates 'a region of interest' with 'portion in HTML document as 
meaningful information.'" The Examiner's assertion misses the mark. The 
statement in Iizuka et al. that "this technique considers a portion in HTML
document as meaningful information" merely means that portions of an 
HTML document are not meaningless - in other words, that portions, and
not only the document as a whole, have meaning. Thus, this statement 
provides a rationale for why someone would want to determine what 
portions (regions) a text has. But, significantly, it does not teach identifying 
a region of interest among those regions, as required by the claims. Nor 
does it teach identifying the region by the applying, evaluating, and 
selecting that are recited in the claims. Nor does it teach segmenting the 
region of interest (as opposed to the text as a whole), as required by the 
claims. Nor does it effect segmentation by the applying, evaluating, 
selecting, applying, and extracting that are recited in the claims. 

It should therefore be evident that, contrary to the Examiner's
assertion, Iizuka et al. do not disclose, teach, or suggest identifying a
region of interest in the text as that identifying is recited in the claims. Nor
do they disclose the recited segmenting.

For the reasons stated above, Applicants request that the Section 
102(e) rejection of their claims over Iizuka et al. be reversed.

Respectfully submitted, 

Eric T. Bax 
Charles C. Fowlkes 
Louis Cisnero, Jr. 

David Volejnicek 
Corporate Counsel 
Reg. No. 29355 
303-538-4154 

Date:

Avaya Inc.
Docket Administrator
307 Middletown-Lincroft Road
Room 1N-391
Lincroft, NJ 07738